Experiments in spoken document retrieval using phoneme n-grams

Authors

  • Corinna Ng
  • Ross Wilkinson
  • Justin Zobel
Abstract

In spoken document retrieval, speech recognition is applied to a collection to obtain either words or subword units, such as phonemes, that can be matched against queries. We have explored retrieval based on phoneme n-grams. The use of phonemes addresses the out-of-vocabulary problem, while the use of n-grams allows approximate matching on inaccurate phoneme transcriptions. Our experiments explored the utility of word boundary information, stop word elimination, query expansion, varying the length of phoneme sequences to be matched, and various combinations of n-grams of different lengths. Given word-based recognition, we can match queries to speech using a phoneme representation of the words, permitting us to test whether it is the recognition or the matching process that is most crucial to retrieval performance. Our experiments show that there is some deterioration in effectiveness, but the particular form of matching is less vital if the sequence of phonemes is correct. When phone sequences are recognised directly, with higher error rates than for words, it is more important to select a good matching approach. Varying gram length trades precision against recall; combining n-grams of different lengths, in particular 3-grams and 4-grams, can improve retrieval. Overall, phoneme-based retrieval is not as effective as word-based retrieval, but is sufficient for situations in which word-based retrieval is either impractical or undesirable.
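As a rough illustration of the idea in the abstract, the sketch below breaks phoneme sequences into overlapping n-grams and scores a document by how many n-grams it shares with the query. This is not the authors' system: the function names, the simple count-overlap score, and the ARPAbet-style phoneme symbols are all assumptions chosen for illustration.

```python
from collections import Counter


def phoneme_ngrams(phonemes, n):
    """Return the overlapping n-grams (as tuples) of a phoneme sequence."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def ngram_profile(phonemes, lengths=(3, 4)):
    """Count n-grams of every requested length in one combined bag."""
    profile = Counter()
    for n in lengths:
        profile.update(phoneme_ngrams(phonemes, n))
    return profile


def overlap_score(query, document, lengths=(3, 4)):
    """Hypothetical score: number of n-gram occurrences shared by both."""
    q = ngram_profile(query, lengths)
    d = ngram_profile(document, lengths)
    return sum(min(count, d[gram]) for gram, count in q.items())


# Illustrative phoneme strings (assumed, not taken from the paper).
query = ["s", "p", "iy", "ch"]                                 # "speech"
doc_clean = ["dh", "ax", "s", "p", "iy", "ch", "r", "eh", "k"]
doc_noisy = ["dh", "ax", "s", "b", "iy", "ch", "r", "eh", "k"]  # p -> b error

print(overlap_score(query, doc_clean))                  # 3-grams + 4-grams
print(overlap_score(query, doc_noisy))                  # one error breaks them
print(overlap_score(query, doc_noisy, lengths=(2,)))    # 2-grams still match
```

In this toy case a single recognition error removes every shared 3-gram and 4-gram, while one 2-gram still matches. That is one way to picture the precision/recall trade-off the abstract attributes to gram length: shorter grams tolerate more errors but also match more loosely.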


Similar resources

ETH TREC-6: Routing, Chinese, Cross-Language and Spoken Document Retrieval

ETH Zurich's participation in TREC-6 consists of experiments in the main routing task, both manual and automatic runs in the Chinese retrieval track, cross-language retrieval in each of German, French and English as part of the new cross-language retrieval track, and experiments in speech recognition and retrieval under the new spoken document retrieval track. This year our routing experiments...


JHU/APL Experiments in Tokenization and Non-Word Translation

In the past we have conducted experiments that investigate the benefits and peculiarities attendant to alternative methods for tokenization, particularly overlapping character n-grams. This year we continued this line of work and report new findings reaffirming that the judicious use of n-grams can lead to performance surpassing that of word-based tokenization. In particular we examined: the re...


Phonetic confusion based document expansion for spoken document retrieval

This paper presents a phone-based approach of spoken document retrieval (SDR), developed in the framework of the emerging MPEG-7 standard. We describe an indexing and retrieval system that uses phonetic information only. The retrieval method is based on the vector space IR model, using phone N-grams as indexing terms. We propose a technique to expand the representation of documents by means of ...


Multi-scale-audio indexing for translingual spoken document retrieval

MEI (Mandarin-English Information) is an English-Chinese crosslingual spoken document retrieval (CL-SDR) system developed during the Johns Hopkins University Summer Workshop 2000. We integrate speech recognition, machine translation, and information retrieval technologies to perform CL-SDR. MEI advocates a multi-scale paradigm, where both Chinese words and subwords (characters and syllables) ar...


Factors affecting speech retrieval

Collections of speech documents can be searched using speech retrieval, in which the documents are processed by a speech recogniser to give text that can be searched by standard text retrieval techniques. Recognition is the translation of speech signals into either words or subword units such as phonemes. We investigated the use of a phoneme-based recogniser to obtain phoneme sequences. We foun...



Journal:
  • Speech Communication

Volume 32  Issue 

Pages  -

Publication date: 2000